Disaster Recovery
Last updated: 2026-04-10
Scenario 1: LXC is broken, snapshot revert works
Symptoms: Service is down, container won't start, config is corrupted.
Recovery time: ~2 minutes
# From Proxmox host
pct stop 101 # or 102
pct rollback 101 phase1-complete
pct start 101
# Verify
pct enter 101
cd /opt/edge-gateway
docker compose ps # All services should be Up
Note
After rollback, any changes made since the snapshot are lost. This includes n8n workflows created after the snapshot, NPM proxy host changes, and firewall rule edits.
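Before stopping the container, it can help to confirm that the snapshot actually exists. The following is a minimal sketch, assuming that pct listsnapshot prints each snapshot name as a whitespace-separated field; has_snapshot is a hypothetical helper, not part of Proxmox.

```shell
# has_snapshot NAME
# Reads `pct listsnapshot <vmid>` output on stdin and succeeds if NAME
# appears as a whole field on any line. Hypothetical helper; the exact
# column layout of the listing is an assumption.
has_snapshot() {
  awk -v name="$1" '
    { for (i = 1; i <= NF; i++) if ($i == name) found = 1 }
    END { exit !found }
  '
}

# Usage on the Proxmox host (not run here):
#   pct listsnapshot 101 | has_snapshot phase1-complete \
#     || echo "snapshot missing, see Scenario 2"
```

If the check fails, the rollback is skipped and the rebuild procedure applies instead.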
Scenario 2: LXC is broken, no usable snapshot
Symptoms: Snapshot is also corrupted, or snapshot was never taken.
Recovery time: ~30 minutes per container
Rebuild edge-gateway (CT 101)
- Delete the broken container:
    pct stop 101
    pct destroy 101
- Recreate from scratch following Phase 1 Implementation Guide.md, Steps 3.1–3.4
- Restore config files:
  - /opt/edge-gateway/docker-compose.yml: copy from Implementation Guide Step 3.4
  - /opt/edge-gateway/.env: retrieve TUNNEL_TOKEN from password manager or Cloudflare dashboard (Zero Trust → Tunnels → exzentcg-homelab → Configure → copy token)
  - NPM proxy host config is stored in /opt/edge-gateway/npm/data/. If this was backed up, restore it. If not, recreate the proxy host for n8n.exzentcg.com manually (Step 8)
- Fix DNS:
    cat > /etc/resolv.conf <<'EOF'
    nameserver 1.1.1.1
    nameserver 1.0.0.1
    EOF
    chattr +i /etc/resolv.conf
- Start services and verify:
    cd /opt/edge-gateway
    docker compose up -d
    docker compose logs --tail 30 cloudflared # Look for "Registered tunnel connection" lines
- Recreate firewall rules: copy /etc/pve/firewall/101.fw from Phase 1 Actions.md Step 5.1
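A missing or empty TUNNEL_TOKEN is an easy mistake when restoring .env by hand. The sketch below shows a small preflight check that could run before starting the stack; env_has_key is a hypothetical helper and assumes the plain KEY=value format.

```shell
# env_has_key FILE KEY
# Succeeds if FILE contains a line of the form KEY=<non-empty value>.
env_has_key() {
  grep -Eq "^$2=.+" "$1"
}

# Usage inside CT 101 (not run here):
#   env_has_key /opt/edge-gateway/.env TUNNEL_TOKEN \
#     || echo "TUNNEL_TOKEN is missing from .env"
```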
Rebuild n8n-app (CT 102)
- Delete and recreate the container following Implementation Guide Steps 4.1–4.4
- Restore config files:
  - /opt/n8n/docker-compose.yml: copy from Implementation Guide Step 4.4
  - /opt/n8n/.env: retrieve N8N_ENCRYPTION_KEY from password manager
- Restore n8n data:
  - If /opt/n8n/data/ was backed up, restore it and chown -R 1000:1000 ./data
  - If not backed up, n8n starts fresh: all workflows, credentials, and the owner account are lost. You must redo the setup wizard.
- Fix permissions and start:
    cd /opt/n8n
    chown -R 1000:1000 ./data
    docker compose up -d
- Recreate firewall rules: copy /etc/pve/firewall/102.fw from Phase 1 Actions.md Step 5.2
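If a tarball of the data directory exists, the restore could look like the sketch below. It assumes the archive was created with tar czf on the absolute path /opt/n8n/data/, so that GNU tar stored the members as opt/n8n/data/... without the leading slash. restore_backup is a hypothetical helper, and the archive name in the usage comment is a placeholder.

```shell
# restore_backup ARCHIVE [ROOT]
# Extracts ARCHIVE under ROOT (default: /). With members stored as
# opt/n8n/data/..., ROOT=/ puts them back at /opt/n8n/data/.
restore_backup() {
  archive=$1
  root=${2:-/}
  tar xzf "$archive" -C "$root"
}

# Usage inside CT 102 (not run here):
#   restore_backup /root/n8n-backup-YYYY-MM-DD.tgz
#   chown -R 1000:1000 /opt/n8n/data
```

The same helper would apply to an NPM data archive on CT 101, under the same assumption about how the archive was created.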
Scenario 3: Proxmox host dies completely
Symptoms: Hardware failure, disk corruption, total loss.
Recovery time: ~2 hours
What you need:
- A new machine (or repaired hardware)
- Proxmox VE ISO (download from proxmox.com)
- This Obsidian vault (stored on your laptop, not on the Proxmox host)
- Access to your password manager
Steps:
- Install Proxmox VE fresh on the new hardware
- Set the management IP to 192.168.0.200 (or update all references)
- Install Tailscale:
    curl -fsSL https://tailscale.com/install.sh | sh && tailscale up
- Recreate Datacenter firewall IP sets (Step 1 of Implementation Guide)
- Enable Datacenter firewall (Step 2)
- Create node firewall rules (Step 0.2)
- Recreate CT 101 edge-gateway (Steps 3.1–3.4)
- Recreate CT 102 n8n-app (Steps 4.1–4.4)
- Apply container firewall rules (Step 5)
- Start cloudflared with stored tunnel token (Step 7.2–7.3)
- Recreate NPM proxy host (Step 8)
- Verify end-to-end: https://n8n.exzentcg.com
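The end-to-end check can also be done from a terminal. The sketch below classifies the HTTP status code returned by curl; status_ok is a hypothetical helper, and accepting 3xx rests on the assumption that Cloudflare Access redirects unauthenticated requests to a login page.

```shell
# status_ok CODE
# Treats any 2xx or 3xx HTTP status as "reachable". A 3xx is accepted on
# the assumption that Cloudflare Access redirects unauthenticated requests
# to a login page.
status_ok() {
  case "$1" in
    2??|3??) return 0 ;;
    *) return 1 ;;
  esac
}

# Usage from any machine with internet access (not run here):
#   code=$(curl -s -o /dev/null -w '%{http_code}' https://n8n.exzentcg.com)
#   status_ok "$code" && echo "reachable ($code)" || echo "check tunnel and NPM ($code)"
```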
Warning
Cloudflare-side config (tunnel, Access policies, DNS) survives a host death. You do NOT need to recreate the tunnel, DNS records, or Access applications. Only the on-premises infrastructure needs rebuilding.
Scenario 4: N8N_ENCRYPTION_KEY lost
Symptoms: n8n starts but all credential nodes show errors. Workflows that use stored API keys/tokens fail.
Recovery: There is no recovery. n8n encrypts stored credentials with AES-256 using this key; without it, the encrypted credential blobs in n8n's SQLite database are unreadable.
Mitigation:
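1. Generate a new encryption key and update .env. Do this first: credentials saved before the key changes become unreadable once it does.
    openssl rand -hex 32
    nano /opt/n8n/.env # replace old key with new
    cd /opt/n8n
    docker compose up -d --force-recreate n8n # a plain restart does not reload .env
2. Re-enter every credential manually in n8n
3. Re-test every workflow that uses credentials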
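The .env edit in the mitigation steps could be scripted instead of done in an editor. This is a minimal sketch, assuming a plain KEY=value file with one variable per line; set_env_var is a hypothetical helper.

```shell
# set_env_var FILE KEY VALUE
# Replaces any existing KEY=... line in FILE with KEY=VALUE, or appends
# the line if KEY is absent. Other lines are kept as they are.
set_env_var() {
  file=$1
  key=$2
  value=$3
  tmp=$(mktemp)
  # grep -v exits non-zero when it outputs nothing; that is fine here.
  grep -v "^${key}=" "$file" > "$tmp" || true
  printf '%s=%s\n' "$key" "$value" >> "$tmp"
  # Write back through the original path so ownership and mode are kept.
  cat "$tmp" > "$file"
  rm -f "$tmp"
}

# Usage inside CT 102 (not run here):
#   set_env_var /opt/n8n/.env N8N_ENCRYPTION_KEY "$(openssl rand -hex 32)"
#   cd /opt/n8n && docker compose up -d --force-recreate n8n  # recreate so the new value is read
```

Store the new key in the password manager before recreating the container, so the same situation does not recur.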
Scenario 5: Cloudflare Tunnel token compromised
Symptoms: Someone has your tunnel token and could run their own connector for your tunnel, which would let them receive or serve traffic for your public hostnames.
Recovery:
- Go to Cloudflare Zero Trust → Networks → Tunnels → exzentcg-homelab
- Rotate the tunnel token (or delete and recreate the tunnel)
- Copy the new token
- Update on edge-gateway:
    pct enter 101
    cd /opt/edge-gateway
    nano .env # replace TUNNEL_TOKEN value
    docker compose up -d --force-recreate cloudflared # a plain restart does not reload .env
    docker compose logs --tail 30 cloudflared # Verify "Registered tunnel connection" appears
- If you recreated the tunnel, you also need to re-add the public hostname route and update the DNS CNAME
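Rather than reading the log by eye, the "Registered tunnel connection" check can be counted. The sketch below is a hypothetical helper that reads log text on stdin; it assumes only that the phrase appears verbatim in each matching line.

```shell
# registered_connections
# Prints the number of stdin lines containing "Registered tunnel connection".
# grep -c exits non-zero on zero matches, so the exit status is normalised.
registered_connections() {
  grep -c 'Registered tunnel connection' || true
}

# Usage inside CT 101, from /opt/edge-gateway (not run here):
#   n=$(docker compose logs --tail 30 cloudflared | registered_connections)
#   [ "$n" -gt 0 ] && echo "tunnel up ($n connections)" || echo "no connections registered"
```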
Backup Strategy (recommended)
| What | How | Frequency |
|---|---|---|
| Proxmox LXC snapshots | pct snapshot <id> <name> | After any significant change |
| n8n data directory | tar czf /root/n8n-backup-$(date +%F).tgz /opt/n8n/data/ (run inside CT 102) | Weekly or before n8n updates |
| NPM data directory | tar czf /root/npm-backup-$(date +%F).tgz /opt/edge-gateway/npm/ (run inside CT 101) | After proxy host changes |
| This Obsidian vault | Git repo or cloud sync (OneDrive, etc.) | Continuous |
| Password manager | Cloud-synced (Bitwarden, 1Password) | Continuous |
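The two tar rows in the table share one pattern, which could be wrapped in a small function and called from cron or by hand. This is a sketch only; backup_dir is a hypothetical helper, and like the commands in the table it passes the absolute source path to tar, so GNU tar stores the members without the leading slash.

```shell
# backup_dir SRC DEST_DIR PREFIX
# Writes DEST_DIR/PREFIX-YYYY-MM-DD.tgz containing SRC and prints the
# archive path on success.
backup_dir() {
  src=$1
  dest=$2
  prefix=$3
  out="$dest/$prefix-$(date +%F).tgz"
  tar czf "$out" "$src" && printf '%s\n' "$out"
}

# Usage (not run here):
#   backup_dir /opt/n8n/data /root n8n-backup          # inside CT 102
#   backup_dir /opt/edge-gateway/npm /root npm-backup  # inside CT 101
```

Archives left in /root are lost together with the container, so copying them off the host is worth considering.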